This code models the risk of a randomly selected set of 100 firms stocks. The time period over which the firms are analyzed were for 2005 to 2010, and compared with values determined from 2000 to 2010.
The SAS code used to pull the data from the DSF data file will be provided. The data cleaning using SAS was stopped once the 100 firms were selected for each set of years. For the risk models, historical data was used to determine a starting point. As seen in the SAS code, the historical data was used from the prior year (in this case, 2004 and 1999).
For the first step, the data was imported from the csv files. These csv’s had the returns, date and firm permanent number for the random 100 firms pulled from the dsf files.
It was assumed that there was $1 million invested into each firm, leading to the assumption in the read file that there were equal weighted position in each firm. Therefore, a simple average of returns was used.
The data importing step was created using a function that takes a file name and reads the data in. This simplified the importing process for multiple files.
# import data
# returns are in %
# sum return percentages by day
# assumed equal investment weight in each firm
read_func <- function(value, I)
{
filename = paste(value, ".csv", sep="")
ret = read.csv(filename, header=TRUE)
firms = ret[["PERMNO"]]
count = length(unique(firms))
port = count*I
ret_sum = aggregate(.~DATE, data=ret, FUN=sum)
ret_sum = ret_sum[order(as.Date(ret_sum$DATE, "%m/%d/%Y"), decreasing=FALSE),]
ret_date = unique(ret_sum[["DATE"]])
keeps = c("RET","DATE")
ret_sum = ret_sum[keeps]
# gets average
i = 1
while(i < nrow(ret_sum))
{
ret_sum[i,"RET"] = ret_sum[i,"RET"]/count
i = i + 1
}
data = list("portfolio" = port, "returns" = ret_sum)
return(data)
}
invest = 1000000 # investment per firm
file1 = "returns_main"
file2 = "returns_comp"
file3 = "returns_main_hist"
file4 = "returns_comp_hist"
# reads files, gets list of returns and value
data_m = read_func(file1,invest)
data_c = read_func(file2,invest)
data_mh = read_func(file3,invest)
data_ch = read_func(file4,invest)
Once the data was pulled from the csv’s, returns were graphed to view the historical distributions with a normal distribution overlay. This was used to test that the return data was imported properly, and view the normality of the returns.
The histogram also served as a sanity check for the VaR. If the VaR value looked unreasonable (too large, for example), it could be quickly checked against the distribution of returns.
# histogram function with normal dist overlay
vec_histogram <- function(x, names)
{
h = hist(x, breaks=(length(x)/50), col="red", xlab=names$xlabel,
main=names$title)
xfit = seq(min(x),max(x),length=40)
yfit = dnorm(xfit, mean=mean(x), sd=sd(x))
yfit = yfit*diff(h$mids[1:2])*length(x)
lines(xfit, yfit, col="blue", lwd=2)
}
# lists with labels for histograms
hist_label_m = list("title" = "Main Period Returns w/ Normal Curve",
"xlabel" = "Daily Returns")
hist_label_c = list("title" = "Comparison Period Returns w/ Normal Curve",
"xlabel" = "Daily Returns")
# write histograms for historical returns
vec_histogram(data_m$returns[["RET"]], hist_label_m)
vec_histogram(data_c$returns[["RET"]], hist_label_c)
The returns distributions show how the firms performed for each of the periods. The main takeaway from each of the periods is that there is no return or loss above 5% for either period. Since the periods overlap in this case, it implies that the largest losses and returns likely occur somewhere between the start of 2005 and the end of 2010.
Specific to these firms, the mean is likely set above 0. The main insight to take away from the distribution is that the VaR and expected shortfall should be expected to be smaller than 5% for both periods.
VaR was the first of the calculations used to determine the risk of the randomly chosen firms. The $VaR was calculated simultaneously in the same function. Both of these risk metrics require a confidence interval with which to measure the downside risk. For this particular case, 95% confidence was chosen.
The function also determines the expected shortfall of the positions over the time period. The value is conditional on distribution of the normal variable being below the VaR. This can be used to determine the magnitude of the worst case scenario of the positions held in the firm over the time period.
#function to calculate var
var_calc <- function(returns,port,a)
{
r_vec = returns[["RET"]]
vol = sd(r_vec)
cumdist = qnorm(1-a,0,1)
var = abs(quantile(r_vec,1-a))
dol_var = abs(quantile(r_vec,1-a)*port)
exp_short = vol*dnorm(cumdist)/(1-a)
data = list("VaR"=var, "$VaR"=dol_var,
"Expected_Shortfall" = exp_short)
return(data)
}
# confidence interval for var
conf = 0.95
# var calculations
var_m = var_calc(data_m$returns,data_m$portfolio,conf)
var_c = var_calc(data_c$returns,data_c$portfolio,conf)
# prints out var Variables
var_m
## $VaR
## 5%
## 0.01484422
##
## $`$VaR`
## 5%
## 1484422
##
## $Expected_Shortfall
## [1] 0.02193938
var_c
## $VaR
## 5%
## 0.01260002
##
## $`$VaR`
## 5%
## 1260002
##
## $Expected_Shortfall
## [1] 0.01916717
The values outputed by the function give insight into the two time periods. For one, the VaR for 2005-2010 was 1.4%; 2 tenths of a percent larger than that of the 2000-2010 VaR. This implies a greater volatility for the shorter time period. Since both of these samples capture the entirety of the 2008 Financial Crisis, it should be expected that the the variance of the 2005 sample is impacted more due to the shorter time period over which it takes place.
Similarly, the $VaR is greater for the 2005 period. The $VaR is simply the VaR scaled with the size of the investment. Since the investment is identical between both periods, the $VaR is influenced by the same forces as the VaR.
The expected shortfall (ES) for each time period shows how the lower bound tails of the returns behave. ES for the 2005 period was 2.2%. This tells us that the loss on a given day within the time period will be 2.2%, given the value of a loss on that is greater than the VaR.
The ES for the comparison period is 1.9%. That makes the difference between VaR and ES for each period less than 1%. The relatively small difference between values shows that the returns over the period were not so extreme as to make a large jump in losses should the VaR be passed. This is reinforced by the distribution returns shown above. There are few outliers on the lower bound tails.
Both of the functions below capture the day by day swing of the variance, VaR and ES of the periods.
The first function is the calculation of the RiskMetrics model (also known as the “exponential smoother”). This solves for the variance of the next day using todays variance and returns. The coefficient of the model is assumed to be 0.94 by the RiskMetrics model.
The function uses the initial variance estimate as the variance across all assets for the previous ten days. In order to find the previous ten days, a historical data set was imported (shown in code above). A loop is used to iterate through the return values for the time period to determine variance values for the RiskMetrics model.
# risk metrics one day modeling
oneday_f <- function(returns,hist_returns,a)
{
# initial variance using historical data
# variance based off of previous 10 days
hist = hist_returns[["RET"]]
variance_0 = var(hist[length(hist)-10:length(hist)])
# variables for while loop
lamda = 0.94
i = 1
variance = c(0)
var = c(0)
exp_short= c(0)
cumdist = qnorm(1-a,0,1)
while(i <= nrow(returns))
{
variance_1 = lamda*variance_0 + (1-lamda)*((returns[i,"RET"])^2)
variance[i] = variance_1
var[i] = -1*sqrt(variance_1)*cumdist
exp_short[i] = sqrt(variance_1)*dnorm(cumdist)/(1-a)
variance_0 = variance_1
i = i + 1
}
returns$variance = variance
returns$VaR = var
returns$ExpShort = exp_short
return(returns)
}
oneday_m = oneday_f(data_m$returns,data_mh$returns,conf)
oneday_c = oneday_f(data_c$returns,data_ch$returns,conf)
The second function uses the GARCH model. The function is structurally similar in its estimation of the future variance, but adds a drift constant. The function also uses a GARCH library command to solve for the omega, alpha and beta constants. The command uses a distribution of returns to solve for the coefficient values used in the GARCH variance estimation. It also outputs the model statistics and coefficient values.
The function also uses a historical data set to determine the initial variance of the model. There is a similar loop structure to iterate through returns and determine the variance of the GARCH model.
# garch model
garch_f <- function(returns,hist_returns,a)
{
# initial variance using historical data
# variance based off of previous 10 days
hist = hist_returns[["RET"]]
variance_0 = var(hist[length(hist)-10:length(hist)])
# function to solve for garch model variables
x.g = garchFit(~garch(1,1),returns[["RET"]])
# summary(x.g)
coef(x.g)
# variables for loop and garch model
i = 1
variance = c(0)
var = c(0)
exp_short= c(0)
cumdist = qnorm(1-a,0,1)
alpha = coef(x.g)[3]
beta = coef(x.g)[4]
omega = coef(x.g)[2]
while(i <= nrow(returns))
{
variance_1 = omega + beta*variance_0 + alpha*((returns[i,"RET"])^2)
variance[i] = variance_1
var[i] = -1*sqrt(variance_1)*cumdist
exp_short[i] = sqrt(variance_1)*dnorm(cumdist)/(1-a)
variance_0 = variance_1
i = i + 1
}
returns$VaR = var
returns$ExpShort = exp_short
returns$variance = variance
return(returns)
}
The following two commands run the above listed function. The output is the model statistics from the GARCH package solver, and the coefficients of alpha, beta, and omega.
garch_m = garch_f(data_m$returns,data_mh$returns,conf)
##
## Series Initialization:
## ARMA Model: arma
## Formula Mean: ~ arma(0, 0)
## GARCH Model: garch
## Formula Variance: ~ garch(1, 1)
## ARMA Order: 0 0
## Max ARMA Order: 0
## GARCH Order: 1 1
## Max GARCH Order: 1
## Maximum Order: 1
## Conditional Dist: norm
## h.start: 2
## llh.start: 1
## Length of Series: 1511
## Recursion Init: mci
## Series Scale: 0.01063618
##
## Parameter Initialization:
## Initial Parameters: $params
## Limits of Transformations: $U, $V
## Which Parameters are Fixed? $includes
## Parameter Matrix:
## U V params includes
## mu -0.35009853 0.3500985 0.03500985 TRUE
## omega 0.00000100 100.0000000 0.10000000 TRUE
## alpha1 0.00000001 1.0000000 0.10000000 TRUE
## gamma1 -0.99999999 1.0000000 0.10000000 FALSE
## beta1 0.00000001 1.0000000 0.80000000 TRUE
## delta 0.00000000 2.0000000 2.00000000 FALSE
## skew 0.10000000 10.0000000 1.00000000 FALSE
## shape 1.00000000 10.0000000 4.00000000 FALSE
## Index List of Parameters to be Optimized:
## mu omega alpha1 beta1
## 1 2 3 5
## Persistence: 0.9
##
##
## --- START OF TRACE ---
## Selected Algorithm: nlminb
##
## R coded nlminb Solver:
##
## 0: 1865.0119: 0.0350099 0.100000 0.100000 0.800000
## 1: 1842.9892: 0.0350110 0.0770096 0.101110 0.790622
## 2: 1824.9658: 0.0350187 0.0418833 0.145217 0.811497
## 3: 1824.1289: 0.0350200 0.0494928 0.146435 0.816261
## 4: 1822.3386: 0.0350279 0.0442568 0.140919 0.821178
## 5: 1821.9951: 0.0350353 0.0397241 0.135190 0.826533
## 6: 1819.7730: 0.0350516 0.0384134 0.125818 0.841978
## 7: 1814.4681: 0.0351172 0.0198608 0.0843122 0.898394
## 8: 1813.8757: 0.0351174 0.0220252 0.0842906 0.899198
## 9: 1813.5373: 0.0351166 0.0211706 0.0823290 0.900067
## 10: 1813.1985: 0.0351111 0.0220507 0.0786251 0.902677
## 11: 1812.4546: 0.0350897 0.0193407 0.0722761 0.908783
## 12: 1811.3565: 0.0350713 0.0162209 0.0605793 0.924120
## 13: 1811.3424: 0.0353443 0.0142101 0.0582107 0.927689
## 14: 1811.1537: 0.0354996 0.0146580 0.0580137 0.928632
## 15: 1810.9223: 0.0392140 0.0131254 0.0515749 0.935677
## 16: 1810.8907: 0.0403230 0.0123075 0.0511664 0.937387
## 17: 1810.8718: 0.0414337 0.0126024 0.0515373 0.936794
## 18: 1810.8575: 0.0425443 0.0127560 0.0519354 0.935957
## 19: 1810.8415: 0.0450621 0.0130104 0.0523940 0.935301
## 20: 1810.8412: 0.0460681 0.0127466 0.0519598 0.936076
## 21: 1810.8402: 0.0458629 0.0129067 0.0521223 0.935667
## 22: 1810.8402: 0.0457987 0.0129011 0.0521521 0.935659
## 23: 1810.8402: 0.0458141 0.0128997 0.0521423 0.935668
## 24: 1810.8402: 0.0458140 0.0128998 0.0521425 0.935668
##
## Final Estimate of the Negative LLH:
## LLH: -5054.38 norm LLH: -3.345056
## mu omega alpha1 beta1
## 4.872854e-04 1.459328e-06 5.214250e-02 9.356679e-01
##
## R-optimhess Difference Approximated Hessian Matrix:
## mu omega alpha1 beta1
## mu -2.619174e+07 1.782904e+09 -3.768674e+04 -1.932095e+03
## omega 1.782904e+09 -6.753649e+13 -1.901585e+09 -3.361636e+09
## alpha1 -3.768674e+04 -1.901585e+09 -1.161684e+05 -1.376488e+05
## beta1 -1.932095e+03 -3.361636e+09 -1.376488e+05 -2.023265e+05
## attr(,"time")
## Time difference of 0.02300191 secs
##
## --- END OF TRACE ---
##
##
## Time to Estimate Parameters:
## Time difference of 0.127007 secs
garch_c = garch_f(data_c$returns,data_ch$returns,conf)
##
## Series Initialization:
## ARMA Model: arma
## Formula Mean: ~ arma(0, 0)
## GARCH Model: garch
## Formula Variance: ~ garch(1, 1)
## ARMA Order: 0 0
## Max ARMA Order: 0
## GARCH Order: 1 1
## Max GARCH Order: 1
## Maximum Order: 1
## Conditional Dist: norm
## h.start: 2
## llh.start: 1
## Length of Series: 2767
## Recursion Init: mci
## Series Scale: 0.009292212
##
## Parameter Initialization:
## Initial Parameters: $params
## Limits of Transformations: $U, $V
## Which Parameters are Fixed? $includes
## Parameter Matrix:
## U V params includes
## mu -0.52371287 0.5237129 0.05237129 TRUE
## omega 0.00000100 100.0000000 0.10000000 TRUE
## alpha1 0.00000001 1.0000000 0.10000000 TRUE
## gamma1 -0.99999999 1.0000000 0.10000000 FALSE
## beta1 0.00000001 1.0000000 0.80000000 TRUE
## delta 0.00000000 2.0000000 2.00000000 FALSE
## skew 0.10000000 10.0000000 1.00000000 FALSE
## shape 1.00000000 10.0000000 4.00000000 FALSE
## Index List of Parameters to be Optimized:
## mu omega alpha1 beta1
## 1 2 3 5
## Persistence: 0.9
##
##
## --- START OF TRACE ---
## Selected Algorithm: nlminb
##
## R coded nlminb Solver:
##
## 0: 3501.3804: 0.0523713 0.100000 0.100000 0.800000
## 1: 3479.5557: 0.0523739 0.0822156 0.0991491 0.791907
## 2: 3460.4861: 0.0523911 0.0599593 0.133071 0.809285
## 3: 3459.0764: 0.0523934 0.0558959 0.131688 0.808259
## 4: 3458.4449: 0.0524014 0.0550814 0.131785 0.812593
## 5: 3457.3108: 0.0524192 0.0495481 0.127994 0.818322
## 6: 3455.8345: 0.0524430 0.0487598 0.125308 0.826681
## 7: 3451.4552: 0.0526124 0.0311618 0.104637 0.864537
## 8: 3447.3815: 0.0533981 0.0301004 0.0739624 0.896371
## 9: 3446.8269: 0.0535419 0.0251979 0.0685815 0.897626
## 10: 3444.5050: 0.0539021 0.0260827 0.0691857 0.901317
## 11: 3442.5940: 0.0549154 0.0194667 0.0562072 0.921493
## 12: 3442.5436: 0.0549161 0.0186038 0.0564906 0.921641
## 13: 3442.4175: 0.0549485 0.0188409 0.0568006 0.922198
## 14: 3442.3653: 0.0549901 0.0184133 0.0568240 0.922380
## 15: 3442.3195: 0.0550807 0.0183016 0.0570188 0.922969
## 16: 3442.2709: 0.0552709 0.0178050 0.0569714 0.923300
## 17: 3442.2417: 0.0554631 0.0178292 0.0570264 0.923580
## 18: 3442.0412: 0.0592159 0.0168951 0.0571747 0.924779
## 19: 3441.8973: 0.0629690 0.0176636 0.0569883 0.923946
## 20: 3441.5045: 0.0633577 0.0146793 0.0495710 0.934291
## 21: 3441.4940: 0.0641450 0.0144963 0.0496029 0.934328
## 22: 3441.4617: 0.0649289 0.0140593 0.0491190 0.935585
## 23: 3441.4415: 0.0668558 0.0141130 0.0487637 0.935818
## 24: 3441.4373: 0.0677620 0.0137861 0.0482411 0.936790
## 25: 3441.4372: 0.0677649 0.0138499 0.0483500 0.936601
## 26: 3441.4372: 0.0677490 0.0138446 0.0483356 0.936620
## 27: 3441.4372: 0.0677511 0.0138447 0.0483369 0.936619
##
## Final Estimate of the Negative LLH:
## LLH: -9504.19 norm LLH: -3.434835
## mu omega alpha1 beta1
## 6.295576e-04 1.195420e-06 4.833689e-02 9.366186e-01
##
## R-optimhess Difference Approximated Hessian Matrix:
## mu omega alpha1 beta1
## mu -53245293.14 1.873363e+09 -4.010179e+04 -1.608147e+04
## omega 1873363388.34 -1.573222e+14 -4.739689e+09 -7.165152e+09
## alpha1 -40101.79 -4.739689e+09 -2.578999e+05 -2.901997e+05
## beta1 -16081.47 -7.165152e+09 -2.901997e+05 -3.807144e+05
## attr(,"time")
## Time difference of 0.03400207 secs
##
## --- END OF TRACE ---
##
##
## Time to Estimate Parameters:
## Time difference of 0.3610198 secs
The comparison plots for daily values created from the two different risk models are determined below. Each set of data has two plots: - Variance - a plot of variance over time - VaR - a plot of VaR over time
The code below is used to create an x axis value for the time series of the plot. This was created by taking the number of observations of returns in the data and dividing the value by 252 (number of trading days in the year). The next commands read the earliest date value. It takes a substring of the year, and transfers it to a numeric value. The year value is added to the observations as a percentage of the trading year. This creates the time value for the given set of data.
This time, variance and VaR values are added into a single data frame. From here, the values can be plotting using the R package plotly. These plots allow for much more interactive and versatile linear plots than the standard R plots. However, upon using in this .Rmd file, it was found out that it isn’t compatable with .pdf writing. Therefore, the output format was changed to .hmtl.
# graph variance, VaR for main data
time = c(1:nrow(garch_m))
year = substr(garch_m[1,"DATE"],7,10)
year = as.numeric(year)
time = time/252 + year
data = data.frame(garch_m,time)
data$rm_variance = oneday_m$variance
data$rm_VaR = oneday_m$VaR
plot_ly(data, x = ~time, y = ~variance, name = "Garch Model",
type = "scatter", mode = "lines",
line = list(color = 'rgb(205, 12, 24)', width = 1.5)) %>%
add_trace(y = ~rm_variance, name = "RiskMetrics Model",
line = list(color = 'rgb(22, 96, 167)', width = 1.5)) %>%
layout(title = "Main Data Daily Variance",
xaxis = list(title = "Year"),
yaxis = list(title = "Variance"))
The variance of the 100 firms from 2005 to 2010 shows general trends of market volatility. The variance had mild spikes in between 2006 and 2007, but remained relatively stable for the early years of the period. Throughout this stable period, the GARCH model showed more variance than RiskMetrics.
After the second quarter of 2007, the variance had a bump to nearly twice its stable value. This was shortly followed by a sharp spike, representing the 2008 financial crisis. During the greater instability, the RiskMetrics model showed greater variance.
You can also see another spike in mid 2010, indicative of the 100 firms likely being affected by the flash crash in May.
From the comparison, it can be inferred that the RiskMetrics model is more reactive to the market returns. This can likely be attributed to the GARCH model having a smaller coefficient for the squared returns, as well as a constant addition. These both contribute to the GARCH model having a more conservative model of variance that doesn’t have as large an estimate during volatile times.
For the VaR, not much in the plotting code was changed. It still calls the same functions and uses the same time series for the x axis. The code calls the same layout of comparative values.
plot_ly(data, x = ~time, y = ~VaR, name = "VaR from Garch",
type = "scatter", mode = "lines",
line = list(color = 'rgb(205, 12, 24)', width = 1.5)) %>%
add_trace(y = ~rm_VaR, name = "VaR from RiskMetrics",
line = list(color = 'rgb(22, 96, 167)', width = 1.5)) %>%
layout(title = "Main Data Daily VaR",
xaxis = list(title = "Year"),
yaxis = list(title = "VaR"))
The daily VaR has similar trends to the the previously plotted variance. The time of instability are now greater. The VaR captures the small periods of increased variance, and spikes to reflect the greater potential losses.
The RiskMetrics model shows a smaller VaR during stable periods, but jumps up to the GARCH model estimation shortly after sharp spikes. During the periods of greatest instability, RiskMetrics shows a greater VaR than GARCH. GARCH appears to be less prone to VaR swings as RiskMetrics, but has a higher resting VaR during more stable periods. This is the same relationship between the two models as shown in the variance models.
The same code is used for the plotting of the comparison period. This is functionally identical to the previously listed code, with the only differences being the data frame being formed from the comparison period risk calculations.
# graph variance, VaR for comparison data
time = c(1:nrow(garch_c))
year = substr(garch_c[1,"DATE"],7,10)
year = as.numeric(year)
time = time/252 + year
data = data.frame(garch_c,time)
data$rm_variance = oneday_c$variance
data$rm_VaR = oneday_c$VaR
plot_ly(data, x = ~time, y = ~variance, name = "Garch Model",
type = "scatter", mode = "lines",
line = list(color = 'rgb(205, 12, 24)', width = 1.5)) %>%
add_trace(y = ~rm_variance, name = "RiskMetrics Model",
line = list(color = 'rgb(22, 96, 167)', width = 1.5)) %>%
layout(title = "Comparison Data Daily Variance",
xaxis = list(title = "Year"),
yaxis = list(title = "Variance"))
The comparison period captures half a decade before the main period. You can see the increased volatility a the turn of the century, likely due to excessive speculation in the emerging tech industry and the 9/11 attacks. The GARCH and RiskMetrics models follow similar trends as previously stated above.
The VaR is plotted in a identical fashion to the previus VaR.
plot_ly(data, x = ~time, y = ~VaR, name = "VaR from Garch",
type = "scatter", mode = "lines",
line = list(color = 'rgb(205, 12, 24)', width = 1.5)) %>%
add_trace(y = ~rm_VaR, name = "VaR from RiskMetrics",
line = list(color = 'rgb(22, 96, 167)', width = 1.5)) %>%
layout(title = "Comparison Data Daily VaR",
xaxis = list(title = "Year"),
yaxis = list(title = "VaR"))
For the comparison period VaR, the VaR again appears to be magnified by the trends of the market volatility. The shifts in VaR values appear to be more sensitive to changes than volatility. For the majority of the early 2000’s, there is rarely ever more than 2% at risk. Outside of the early 2000’s panics, the VaR consistently fluctuates about 1%. Similar to the main period, the 2008 financial crisis leads to 5% of the portfolio being at risk during the moments of highest volatility.
The comparison between the two periods shows that both risk models correct themselves over time. The initial estimate is relatively neglible once a few days pass, and the model reaches a predictable path. For each model, the financial crisis trends nearly identically with one another, even though one model starts half a decade earlier.